We study the construction of coresets for kernel density estimates. That is, we show how to approximate the kernel density estimate described by a large point set with another kernel density estimate over a much smaller point set. For characteristic kernels (including the Gaussian and Laplace kernels), our approximation preserves the $L_\infty$ error between kernel density estimates within error $\epsilon$, with coreset size $2/\epsilon^2$; this size depends on no other aspect of the data, including the dimension, the diameter of the point set, or the bandwidth of the kernel, all of which are common to other approximations. When the dimension is unrestricted, we show this bound is tight for these kernels, as well as for a much broader class.

This work provides a careful analysis of the iterative Frank-Wolfe algorithm adapted to this context, an algorithm called \emph{kernel herding}. This analysis unites a broad line of work that spans statistics, machine learning, and geometry.

When the dimension $d$ is constant, we demonstrate much tighter bounds on the size of the coreset specifically for Gaussian kernels, showing that it is bounded by the size of a coreset for axis-aligned rectangles. Currently the best known constructive bound for that problem is $O(\frac{1}{\epsilon} \log^d \frac{1}{\epsilon})$, and non-constructively this can be improved by a factor of $\sqrt{\log \frac{1}{\epsilon}}$. This improves the best constant-dimension bounds polynomially for $d \geq 3$.
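For concreteness, the objects involved can be stated as follows, with notation we introduce here rather than take from the text. For a point set $P \subset \mathbb{R}^d$ and a kernel $K$, the kernel density estimate is
\[
\textsc{kde}_P(x) = \frac{1}{|P|} \sum_{p \in P} K(p, x),
\]
and the coreset guarantee above asks for a small subset $Q$ satisfying
\[
\|\textsc{kde}_P - \textsc{kde}_Q\|_\infty = \max_{x \in \mathbb{R}^d} \left| \textsc{kde}_P(x) - \textsc{kde}_Q(x) \right| \le \epsilon.
\]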
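To illustrate the kernel herding algorithm named above, here is a minimal Python sketch that greedily selects a coreset from $P$. The function names, the fixed Gaussian bandwidth, and the restriction to selecting points of $P$ are our assumptions for illustration, not details fixed by the text:

\begin{verbatim}
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Pairwise Gaussian kernel: K[i, j] = exp(-||X_i - Y_j||^2 / (2 h^2))."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def kernel_herding(P, m, bandwidth=1.0):
    """Greedily pick m coreset points from P (an (n, d) array) by
    kernel herding, i.e. Frank-Wolfe on the RKHS mean embedding."""
    n = len(P)
    K = gaussian_kernel(P, P, bandwidth)   # n x n kernel matrix
    mean_embed = K.mean(axis=1)            # kde_P evaluated at each candidate
    coreset_sum = np.zeros(n)              # running sum_j K(x_i, q_j)
    chosen = []
    for t in range(m):
        # Residual kde_P(x_i) - kde_Q(x_i); pick the point where the
        # current coreset most underestimates the full estimate.
        scores = mean_embed - (coreset_sum / t if t > 0 else 0.0)
        j = int(np.argmax(scores))
        chosen.append(j)
        coreset_sum += K[:, j]
    return P[chosen]
\end{verbatim}

For example, \texttt{kernel\_herding(np.random.randn(1000, 2), m=50)} returns a 50-point coreset whose uniform kernel density estimate approximates that of the full set; each greedy choice is exactly the Frank-Wolfe step in the reproducing kernel Hilbert space.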